The efficient generation of pronunciation dictionaries: machine learning factors during bootstrapping
Authors
Abstract
Several factors affect the efficiency of bootstrapping approaches to the generation of pronunciation dictionaries. We focus on factors related to the underlying rule-extraction algorithms, and demonstrate variants of the Dynamically Expanding Context algorithm that are beneficial for this application. In particular, we show that continuous updating of the learned rules, coupled with a new approach to grapheme-to-phoneme alignment and a sliding-window approach to choosing the context window, leads to an efficient and accurate bootstrapping mechanism.
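As a rough illustration of the kind of rule extraction the abstract refers to, the sketch below learns context-dependent grapheme-to-phoneme rules by widening the context around a letter only when a narrower context is ambiguous. It is a simplified batch version written under stated assumptions, not the paper's algorithm: the one-to-one alignment of letters to phonemes, the alternating right-then-left growth order in `window_sizes`, the `max_width` limit, and all function names are hypothetical choices made for this example, and the continuous-updating and alignment steps described in the abstract are not modelled.

```python
from collections import defaultdict

PAD = "#"  # word-boundary symbol (an assumption of this sketch)


def window_sizes(max_width):
    """Yield nested (left, right) context sizes, growing right first, then left."""
    left = right = 0
    yield left, right
    while left + right < max_width:
        if right <= left:
            right += 1
        else:
            left += 1
        yield left, right


def context(word, i, left, right):
    """Return the left and right context strings around word[i], padded at the edges."""
    padded = PAD * left + word + PAD * right
    pos = i + left  # index of word[i] inside the padded string
    return padded[pos - left:pos], padded[pos + 1:pos + 1 + right]


def learn_rules(pairs, max_width=4):
    """Learn (left, grapheme, right) -> phoneme rules from pre-aligned pairs.

    Each pair is (word, phonemes) with exactly one phoneme per letter.
    """
    rules = {}
    pending = []
    for word, phones in pairs:
        assert len(word) == len(phones), "sketch assumes one phoneme per letter"
        pending.extend((word, i, p) for i, p in enumerate(phones))

    for left, right in window_sizes(max_width):
        # Collect every phoneme observed for each context at this width.
        seen = defaultdict(set)
        for word, i, p in pending:
            l, r = context(word, i, left, right)
            seen[(l, word[i], r)].add(p)

        # Keep unambiguous contexts as rules; retry the rest with a wider window.
        unresolved = []
        for word, i, p in pending:
            l, r = context(word, i, left, right)
            key = (l, word[i], r)
            if len(seen[key]) == 1:
                rules[key] = p
            else:
                unresolved.append((word, i, p))
        pending = unresolved
        if not pending:
            break
    return rules


def predict(word, rules, max_width=4):
    """Apply the widest matching rule to each letter; '?' marks letters with no rule."""
    sizes = list(window_sizes(max_width))[::-1]  # try the widest context first
    output = []
    for i, g in enumerate(word):
        phone = "?"
        for left, right in sizes:
            l, r = context(word, i, left, right)
            if (l, g, r) in rules:
                phone = rules[(l, g, r)]
                break
        output.append(phone)
    return output


# Toy, hand-made training data (not from the paper).
toy_pairs = [
    ("cat", ["k", "a", "t"]),
    ("city", ["s", "i", "t", "i"]),
    ("cot", ["k", "o", "t"]),
]
toy_rules = learn_rules(toy_pairs)
```

With this toy data the letter "c" is ambiguous on its own, so the sketch only settles it once one letter of right context is included, while the other letters are resolved with no context at all.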
Similar resources
The efficient generation of pronunciation dictionaries: human factors during bootstrapping
Bootstrapping techniques have significant potential for the efficient generation of linguistic resources such as electronic pronunciation dictionaries. We describe a system and an approach to bootstrapping for the development of such dictionaries, and report on experiments conducted to investigate the efficiency and effectiveness of the system, focusing on the human factors that influence the p...
Bootstrapping pronunciation dictionaries: practical issues
Bootstrapping techniques are an efficient way to develop electronic pronunciation dictionaries [1, 2], but require fast system response to be practical for medium-to-large lexicons. In addition, user errors are inevitable during this process, and it is useful if automatic means can be used to assist in the search for such errors. We describe how the Default&Refine grapheme-to-phoneme rule extrac...
Automatic Learning and Optimization of Pronunciation Dictionaries
Pronunciation dictionaries are the interface between orthographic and phonetic representation of the speech signal and are thereby a substantial component of speech recognition systems. In many systems simple canonical pronunciation forms are used within the dictionary. They represent the “correct” pronunciation as they are found in lexicons and neither contain the most frequent pronunciation n...
Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies
The speed with which pronunciation dictionaries can be bootstrapped depends on the efficiency of learning algorithms and on the ordering of words presented to the user. This paper presents an active-learning word selection strategy that is mindful of human limitations. Learning rates approach that of an oracle system that knows the final LTS rule set.
Learning pronunciation dictionary from speech data
In this paper, an algorithm and first results from our investigations in automatically learning pronunciation variations from speech data are presented. Pronunciation dictionaries establish an important feature in state-of-the-art speech recognition systems. In most systems only simple dictionaries containing the canonical pronunciation forms are implemented. However, for a good recognition perfor...
Journal title:
Volume, issue
Pages -
Publication date: 2004